fix: resolve data race in Broadcasters system #153
Conversation
**Requirements**

- [x] I have added test coverage for new or changed functionality
- [x] I have followed the repository's [pull request submission guidelines](../blob/v5/CONTRIBUTING.md#submitting-pull-requests)
- [ ] I have validated my changes against all supported platform versions
  - What does "platform" mean here? Go version? OS version? Happy to help test, just not sure what I'm testing!

**Related issues**

I hit data races when running tests in parallel, which effectively obscures other data races my application may have. It looks like others did too, judging by #102.

Fixes #102

**Describe the solution you've provided**

I've addressed two data races with improved locking (see the sketch below):

- `HasListeners()` had a data race due to any mutation of `b.subscribers`
- `Close()` had a data race where closing a send channel triggers a panic in `Broadcast()`

I also saw an easy opportunity to use more fine-grained locking with an `RWMutex`, although I'm happy to back that out if you would prefer.

**Describe alternatives you've considered**

I also considered using an atomic data type for the subscribers, but I figured that change would be less of a surgical fix. It would also be more difficult to mutate the slice, since a compare-and-swap can fail and would need a retry loop (it's not as simple as `atomic.Add`).

Another idea, which may be simpler, is using a channel to manage shutdown. Typically I'd use a `context.Context` here to manage cancellation. That would also prevent `Broadcast()`ing on a full send channel from blocking `Close()`.
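A minimal sketch of the locking scheme described above, assuming a generic `Broadcaster` with a `subscribers` slice of channel pairs (the type and field names here are illustrative, not the SDK's actual implementation):

```go
package broadcasters

import "sync"

// channelPair and Broadcaster are hypothetical stand-ins for the SDK's
// internal types, shown only to illustrate the locking changes.
type channelPair[V any] struct {
	sendCh    chan<- V
	receiveCh <-chan V
}

type Broadcaster[V any] struct {
	lock        sync.RWMutex // RWMutex instead of Mutex: concurrent readers don't block each other
	subscribers []channelPair[V]
}

// HasListeners previously read the subscribers slice with no lock, racing
// with any concurrent mutation; a read lock is sufficient to fix that.
func (b *Broadcaster[V]) HasListeners() bool {
	b.lock.RLock()
	defer b.lock.RUnlock()
	return len(b.subscribers) > 0
}

// Close takes the write lock before closing the send channels, so a
// concurrent Broadcast can no longer panic by sending on a just-closed channel.
func (b *Broadcaster[V]) Close() {
	b.lock.Lock()
	defer b.lock.Unlock()
	for _, pair := range b.subscribers {
		close(pair.sendCh)
	}
	b.subscribers = nil
}
```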
This is a small cleanup to the existing Broadcasters implementation to use `slices.DeleteFunc`.
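For illustration, a sketch of what that cleanup could look like, reusing the hypothetical `channelPair`/`Broadcaster` types from the sketch above (matching on `receiveCh` is an assumption about how listeners are identified):

```go
import "slices" // slices.DeleteFunc requires Go 1.21+

// RemoveListener filters out the pair whose receive side matches ch,
// closing its send channel as it goes. slices.DeleteFunc replaces a
// hand-rolled filter loop without changing behavior.
func (b *Broadcaster[V]) RemoveListener(ch <-chan V) {
	b.lock.Lock()
	defer b.lock.Unlock()
	b.subscribers = slices.DeleteFunc(b.subscribers, func(pair channelPair[V]) bool {
		if pair.receiveCh == ch {
			close(pair.sendCh)
			return true
		}
		return false
	})
}
```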
```go
	for _, ch := range ss {
		ch.sendCh <- value
	}
	b.lock.RLock()
```
So, there is a behavioral change here, which may or may not be relevant. Previously you could remove a listener during the broadcast (the lock wasn't retained for the actual sends), but now you wouldn't be able to get the exclusive lock during that time.
Hmm, we probably shouldn't hold the lock during the broadcast, as that could take an unbounded amount of time to complete. I'll revert that and add a test.
Ok, so there's a problem here: if we allow removing a listener while `Broadcast` is running concurrently, then the broadcast could try to send to a channel that was closed by removing the listener. That's a panic (see the snippet below).
So that behavior was already bad.
I'm thinking it's better to say "if you aren't draining your receivers, you're not going to be able to add or remove receivers", which would be the case if we locked around the broadcast.
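For reference, the core hazard is a basic Go rule: sending on a closed channel always panics, so any interleaving where a removal closes a channel that `Broadcast` is about to send on crashes the process.

```go
package main

func main() {
	ch := make(chan int, 1)
	close(ch)
	ch <- 1 // panic: send on closed channel
}
```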
I'm reverting to locking around the broadcast, as sketched below. I'll note in the release that this may affect the use case of adding/removing a listener while broadcasting.
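A minimal sketch of that final shape, again using the hypothetical types from the earlier sketch: `Broadcast` holds the read lock for the whole send, so `Close` or a listener removal (which need the write lock) cannot close a channel mid-broadcast.

```go
// Broadcast holds the read lock across the sends. Multiple Broadcasts may
// still run concurrently, but writers (add/remove/Close) must wait until
// no broadcast is in flight.
func (b *Broadcaster[V]) Broadcast(value V) {
	b.lock.RLock()
	defer b.lock.RUnlock()
	for _, pair := range b.subscribers {
		pair.sendCh <- value // blocks until the receiver drains it
	}
}
```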
This test is fundamentally flawed in that the order in which the broadcasters execute matters. We can't execute them in random order without causing a deadlock, since receivers need to drain their channels for `Broadcast` to unblock (see the sketch below).
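A sketch of the constraint such a test has to respect, with hypothetical `NewBroadcaster`/`AddListener` names: each listener is drained in its own goroutine, so `Broadcast` can always make progress no matter how many listeners exist.

```go
import (
	"sync"
	"testing"
)

func TestBroadcastWithDrainingReceivers(t *testing.T) {
	b := NewBroadcaster[int]() // hypothetical constructor
	var wg sync.WaitGroup
	for i := 0; i < 10; i++ {
		ch := b.AddListener() // hypothetical; returns a receive channel
		wg.Add(1)
		go func() {
			defer wg.Done()
			for range ch { // drain until the channel is closed
			}
		}()
	}
	b.Broadcast(42) // would deadlock here if the receivers weren't draining
	b.Close()       // closing the send channels ends the drain goroutines
	wg.Wait()
}
```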
🤖 I have created a release *beep* *boop*

## [7.4.1](v7.4.0...v7.4.1) (2024-06-04)

### Bug Fixes

* add warning log message if client init duration is longer than recommended (60s) ([68e3440](68e3440))
* align high timeout log msg with spec ([#150](#150)) ([606d224](606d224))
* resolve data race in Broadcasters system ([#153](#153)) ([68cb1a4](68cb1a4))

---

This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
This resolves a data race in the `Broadcaster` implementation: specifically, `HasListeners()` was not protected by a lock. Additionally, it swaps the mutex for an `RWMutex` so that concurrent readers don't block each other. A behavioral change is that the `Broadcast` method now locks the subscribers array, so that `Add` or `RemoveListener` cannot be called concurrently.